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ABSTRACT 

Several approaches exist to model gravitational lens systems. In this study, we apply global op- 
timization methods to find the optimal set of lens parameters using a genetic algorithm. We treat 
the full optimization procedure as a two-step process: an analytical description of the source plane 
intensity distribution is used to find an initial approximation to the optimal lens parameters. The sec- 
ond stage of the optimization uses a pixelated source plane with the semilinear method to determine 
an optimal source. Regularization is handled by means of an iterative method and the generalized 
cross validation (GCV) and unbiased predictive risk estimator (UPRE) functions that are commonly 
used in standard image deconvolution problems. This approach simultaneously estimates the optimal 
regularization parameter and the number of degrees of freedom in the source. Using the GCV and 
UPRE functions we are able to justify an estimation of the number of source degrees of freedom found 
in previous work. We test our approach by applying our code to a subset of the lens systems included 
in the SLAGS survey. 

Subject headings: gravitational lensing: strong — methods: numerical 



1. INTRODUCTION 

Methods for modeling gravitational lens systems are 
divided into a broad dichotomy between schemes that 
require a parameterized analytical model for the source 
intensity distribution, and schemes that assume only a 
pixelated source with no underlying model. Methods 
that parameterize the source intensity distribution are 
often quite easy to implement, but assume a priori knowl- 
edge of the source structure. Schemes that make use of 
a pixelated source are generally more complex, but offer 
greater flexibility since no parametric form is assumed 
for the source. This paper makes use of both parameter- 
ized and pixelated source models, exploiting the benefits 
provided by each. 

Lens inversion schemes based on analytical source 
models assume an intensity distribution /s(/3) in the 
source plane /3. A model of the lens density is then used 
to calculate a ray-tracing from the image plane 9 to the 
source plane using the thin lens equation 



(3{e) = e-a{e) 



(1) 



where a{6) is the deflection angle field calculated 
from the projected le ns potential (ISchneideii 119851 : 
[Schneider etall Il992t IPetters et all I2001D . Since 
xavitational lensing co nserves surface brightness 
Kavser fc Schrammlll988[ ). the lensed image intensity is 
easily found by 

m = him) (2) 

for an assumed parametric source intensity function Is- 
The resulting lensed image I{0) is then convolved with 
a point spread function (PSF) and compared with the 
data. The statistic is minimized over the combined set 
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of lens and source parameters using non-linear methods 
for parameter se arch and glo bal optimization. 

Sersic profiles (|Sersiclll968l ) are widely used for galaxy 
scale sources, as defined by 



Is{r) = /oexp{-fc(n)[(r/ro) 



1]}, 



(3) 



which assumes intensity /q at the scale length tq and 
shape index n. The shape index controls the curvature of 
the profile, w here most galaxies hav e profiles with 0.5 < 
n < 10. The Ide VaucouleursI (jl948D profile is recovered 
for n — A, and the exponential disk is found by setting 
n = 1. The scaling factor k{n) is used to normalize the 
distribution such that half the total luminosity is within 

Due to their fiexibility and simple physical interpre- 
tation, Sersic functions arc commo nly used to model 
lensed sources (.Marshall et al. 2007; iBolton et all 120081: 
iBrewer and Lewis! 1201 If T However, more complicated 
analytical source functions have also been used to ap- 
proximate the varied and complex morphologies of galax- 
ies and c an include hund reds of parameters in extreme 
cases (Tv son et al.lll99"8l ). In general, analytical mod- 
els are used because they are typically fast to evaluate 
and provide an intuitive understanding of the resulting 
source. 

As useful as analytical models are, they may not be 
flexible enough to describe complex sources and may bias 
the lens parameters during minimization to compen- 
sate for the artificial constraints imposed by their as- 
sumed analytical form. Pixelated source models were 
introduced to move past this limitation. This approach 
represents the source plane intensity as a set of basis 
functions, each having an adjustable parameter that rep- 
resents the surface brightness of the source plane at a 
given pixel. The semilinear method treats each pixel 
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as a basis function and minimizes the mismatch be- 
tween model and data by manipula ting the brightness of 
each source pix el indepen dently (iWarren fc Dvell2003l : 
iTreu fc Koop^ ans 2004; Su vu et all 1 20061 1 

The semilinear method divides the lens modeling prob- 
lem into a non-linear "outer loop" problem that solves 
for lens parameters, and an "inner loop" problem that 
solves for the pixelated source, assuming a fixed set of 
lens parameters. An important benefit of this approach 
is that the inner loop problem is linear and therefore 
does not require complicated nonlinear optimization rou- 
tines. The blurring and lensing effects are expressed 
by the matrix / = BL. The lensing matrix L en- 
codes the ray tracing operation from the image plane to 
the source plane and forms the lensed image of a given 
source brightness distribution. In this work we make 
use of a bilinear interpolation scheme, where the cen- 
ter of each image pixel is traced to a position on the 
source plane using the lens equation. Then the bright- 
ness of an image pixel is found by a weighted average of 
the four source pixels that enclose each ba ck-traced ray 
(jTreu fc Koopmansll20dl :lKoopmansl |2005D . This choice 
is not unique and many different kinds of interpolation 
schemes have be en studied in the lit erature including 
nearest neig hbor (IWarren fc Dye I2003D . adaptive source 
pixel til ings (iDve fc WarrenI 2005 |and delaunay triangu- 
lations (jVegetti fc KoopmansI 120091 ). We plan on study- 
ing the effects of a variety of such interpolation schemes 
in future work. 

The blurring matrix B describes the effect of the PSF 
on the resulting lensed image. By minimizing the 
statistic with respect to the source plane intensities Sj, 
the least-squares form of the problem is exposed: 



F^Fs 



F^d. 



(4) 

where F is the lens matrix divided by the errors in the 
data, Fij = fij/cJi, and s is a "flattened" image vec- 
tor containing the intens i ties of the sourc e plane pix- 
els (jWarren fc Dvd [200l iKoopmansI [20051) . The vec- 
tor di = di/ai is the data vector d normalized by the 
noise Ui. This type of problem has been well studied in 
the context of th e standard image dcconvol ution problem 
(IGolub. Heath fc Wahba 1979: Vogcl 19871 : IHansen|[l99l 
INagv et al.ll2002n . which seeks to remove the distortion 
introduced by a blurring function (PSF). 

In general, the solution of Equation |4] requires regu- 
larization to stabilize the inversion of the system matrix 
F"^ F (iKoopmans .2005) . The modified matrix is then 
given by 

M = F^F + XH^H, (5) 

where if is a regularization matrix and A a multiplier 
that controls the amount of regularization added to the 
problem. The simplest case, zeroth order regularization, 
assumes that H = I. This scheme regularizes the prob- 
lem by seeking the solution s that has minimal inten- 
sity over the source plane. Higher order regularization 
schemes are also commonly used, such as curvature reg- 
ularization that uses the second order derivatives of s 
to smooth the solution by minimizing the curvature over 
the source plane. Regularization schemes seek to impose 



physicality constraints on the source intensity to select a 
smoothly varying and physically realistic solution from 
the many alternatives that exist to solve the ill-posed 
system. Linear regu l arizat ion schemes were studied in 
depth bv lSuvu et al.l (|2006( ). 

Following our previous work (jRogers fc Fiegd[20TTal) 
we use the Qubist Optimization Toolbox ( Fiegd 120101) 
to find the nonlinear lens parameters varied in the outer 
loop of the lens inversion problem. The Qubist Tool- 
box contains several non-linear global optimization rou- 
tines including Ferret, an advanced genetic algorithm 
(GA), and Locust, a particle swarm optimizer (PSO). 
In the inner loop, we solve the least squares problem of 
the sernilinea r method using Krylov subspace methods 
(|Biorcklll996( ). Krylov subspace methods are well known 
in the image deblurring community and have been stud- 
ied in th e context of dcconvolution problems at length 
(|Hansen|[T99 4: Nag y et al. 2002). This class of optimiza- 
tion routines include the conjugate gradient method for 
least squares problems (CGLS) and the steepest descent 
(SD) method. Krylov methods are attractive because 
they naturally regularize ill-posed problems and are ef- 
ficient at solving large scale problems. We previously 
studied the performance of the GA and PSO methods on 
test pr oblems using simulated lens data ([Rogers fc Fiegd 
l2011al) . In that work we found that the GA explored 
the parameter space more thoroughly than the PSO, al- 
though the PSO was slightly faster to converge. 

In this work, we will explore parameter selection meth- 
ods to determine an appropriate value for the regular- 
ization constant in the semilinear method, and use the 
Ferret GA with our lens code to model data from the 
SLAGS survey. We use a two stage approach to the lens 
modeling problem: we begin the optimization with an- 
alytical sources to estimate the approximate position of 
the globally optimal lens parameters, and switch to a 
pixelated source for further model refinement once the 
global optimizer has converged. 

2. GRAVITATIONAL LENS SOURCE DECONVOLUTION 

The semilinear method with regularization describes 
gravitational lens modeling in the context of a least 
squares problem, where we seek a vector s that mini- 
mizes 
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\Fs - d\ 



X\\Hs\ 



(6) 



The first term in this sum is the between the model 
and observed images, while the second term quantifies 
the strength of the regularization. In general, gravita- 
tional lensing produces multiple images, so is a rect- 
angular N X M matrix {N > M) , where N is the number 
of image pixels involved in the inversion and M the num- 
ber of source pixels. 

The most direct method to solve the least squares prob- 
lem is to decor npose F using the singu lar value decom- 
position ('SVD: iGohIb fc Reinschl[T970l ). 



F = VEV^ , 



(7) 



where S is an iV x M diagonal matrix composed of a 
set of non-zero, non-increasing elements Sjj — Vj such 
that vi > V2 >,...,> vm- These diagonal elements are 
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the singular values of F, defined as the eigenvalues of 
F'^ F and FF'^ , both of which produce identical sets of 
no n- vanishing eigenvalues. The U and V matrices are 
orthogonal {N x N) and (M x M) matrices respectively. 
We denote the columns of these matrices as Ui and Vj, 
the left and right singular value basis vectors. These 
vectors are the set of eigenvectors of the square matrices 
FF^ (TV X N) and F^ F {M x M). 

It is straightforward to write the solution to the system 
defined by Equation ^ using the SVD in the absence of 
regularization when A = in Equation ([6|), expressing 
the solution as a sum over the basis vectors Vi: 



{F^F)-^F^d- 



■J 



(8) 



In this equation, we have written the SVD in terms of 
sums over the orthogonal columns of U and V, and the 
entries of S~^, which is simply defined as an M x di- 
agonal matrix with non-zero elements Tj'J^ = 1/vj. The 
SVD allows us to express s as an expansion over the 
orthogonal basis Vj . 

The matrix F will have small singular values such that 
Vj — > if the problem is ill-posed. These vanishingly 
small singular values cause the corresponding terms in 
Equation (|8]) to become large. The solution s may then 
become corrupted by the noise contained in the data vec- 
tor d. This amplification of noise due to small singu- 
lar values is the reason why regularization is required in 
Equation 

The simplest regularization scheme simply truncates 
the terms that arise from small singular values from the 
sum in Equation ([8|). Since the singular values form 
a non-increasing set, this corresponds to discarding all 
terms j > k, where k is the truncation threshold. Early 
termination of the sum removes the high frequency com- 
ponents of the basis vectors Vj. This is known as the 
truncated singular value decomposition, or TSVD: 



J' 



(9) 



where are a set of constants called the filter fac- 
tors that are equal to 1 for terms j < k and for all 
terms higher than this threshold. However, terminat- 
ing the summation abruptly may discard too much high 
frequency information. A more general choice is to grad- 
ually decrease the contribution of small singular value 
terms to the sum. This approach is called Tikhonov reg- 
ularizat ion, which amou nts to a modification of the filter 
factors ()Tikhonovlll963D : 



(10) 
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where A is the regularization constant. Note that Cf 
when I/? 3> A, which occurs for small j. When Vj is 
smaller than the regularization constant (large j), the 
filter factors damp the corresponding terms of Equation 
^ as (f>j ~ '^j/^- Thus, A must be assigned a value 



between the maximum and minimum singular values vi 
and i/N. This regularization scheme corr esponds to set- 
ting th e matrix if = 7 in Equation ([6]) (iTwomevI 119631 : 
iTikhon ov 1963). Regularization modifies the system that 
we are attempting to solve so that the inverse of Equation 
([7]) becomes 



{F^F + Xiy^F^ = VT,-^ *J7^ 



(11) 



where $ is the TV x TV diagonal matrix of filter factors 
with diagonal elements <^i>M — 0. 

Note that these schemes do not specify how much 
regularization should be included for a given problem. 
The strength of the regularizing effect in Tikhonov reg- 
ularization is controlled by the value of the regulariza- 
tion constant A and by the truncation index k in the 
TSVD scheme. The regularization constant is a "hyper- 
parameter" which must be selected a priori. Fortunately, 
several methods exist to estimate the optimal regulariza- 
tion parameter for a given problem (Hansen 2010). 

2.1. Regularization Parameter Selection Methods 

A widely used technique to se lect a regular ization pa- 
rameter is the L-curve criterion (|Hansenlll99l . which we 
used in [Rogers fc Fiegj (j2011aD . The L-curve is a plot of 
the residual versus the regularization term that appears 
in Equation ([5]), and is named for the characteristic shape 
of the resulting curve. The L-curve is parameterized by 
the regularization constant A and the position on the 
plot with the largest curvature represents a balance be- 
tween the image and regularization term (jPress et al.l 
|2007| ). This does not imply that an optimally regularized 
solution has a reduced exactly equal to 1, but should 
trade-off between the amount of source structure and the 
quality of the fit. 

As an alternative to the L-curve, another well-known 
regularization selection method is generalize d cross val- 
idation fGCV: iGoIub. Heath fc Wahbal [19791) . This is a 
statistical method that aims to minimize the mean square 
error, — where S0 is the optimally regularized 

solution. We now define the GCV function: 



G(A) 



\d-Fs\ 



trace(/jv--FF7i)2^ 



(12) 



where In the N x N identity matrix. This equation is 
based on statistical arguments that consider a solution to 
be properly regularized when it can predict elements of 
the data vector that have been omitted (Hansen 1997) . 
The trace term in the denominator can be dramatically 
simplified given the definition of F'^^ in terms of the SVD 
(Equation (ITlT) '). The denominator of the GCV function 
becomes: 

trace(JAr - FFS'^ *t/^) = trace(JAr - U^U^) (13) 

using the SVD expansion of F (Equation ([7])). With the 
orthogonality of TJ and the diagonality of ^, the trace 
term simplifies dramatically. We are left with trace(/Ar — 
$) such that 



trace(7Ar - FF^'^) = N 



(14) 
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This sum represents the number of degrees of freedom 
in the problem. Putting these arguments together, the 
GCV function becomes 



G(A) 



\Fs, 



d\ 



(15) 



IWahbal ()1977[ ) showed that when the errors in the data 
vector are unbiased white noise with covariance matrix 
C = (J'^In , and satisfy the discrete Picard condition 
(|Kresslll989l : lEngl et aLl(l996l) . the minimum of the GCV 
function corresponds to a regularization parameter that 
is a good estimator of the optimal A and approaches this 
value asymptotically as TV — > cxd. The convergence re- 
sults between the true solution of a test problem and 
the GCV-regularized solution have also been thoroughly 
explored when th ese conditions are not satisfied (,Vogeli 
[19871: ILukaslll99l . 

The denominator of the GCV function has special sig- 
nificance for gravitational lens modeling. Lens modeling 
schemes that pixelate the source plane have been criti- 
cized for relying on regularization since smoothing causes 
the number of degre es of freedom in the so urce to be- 
come undetermined (|Kochanek et al.ir2004D . iSuvu et all 
(|2006f ) give an estimate for the number of effective de- 
grees of freedom based on Bayesian arguments. In that 
work the authors construct a variety of possible expres- 
sions for the number of degrees of freedom (NDF), and 
chose NDF = N — j with N the number of image pixels, 
and 

M 9 

(16) 



' / V 2 1 \ ' 

^ t'f H- A 

which corresponds to Tikhonov (zcroth order) regular- 
ization when H — I in Equation ([5]). This expression 
was selected as the correct number of degrees of freedom 
based on an empirical test that produced a reduced 
nearest to 1 for a simulated test problem (see Table 1, 
ISuvu et all [20061) ■ In fact, 7 is simply the sum of the 
filter factors from Tikhonov regularization. The GCV 
function gives a statistical argument for choosing this 
value based on the nature of an optimally regularized 
source inversion. 

In addition to the GCV me thod, the unb iased pre- 
dictive risk estimator (UPRE; [Mallows! I1973D has also 
been used to select the regularizati o n parameter in 
deconvolution problem s ([Vogell 120021 : iBardslevI 120061 : 
iLin fc Woh lberg 20081. The UPRE method was initially 
developed for model selection in linear regression, though 
variations of this approach have been subsequently ap- 
plied to the solution of inverse problem s. A concis e 
derivation of the method can be found in [\^geil ([200l : 
here we simply define the UPRE function 



C/(A) = I |Fs0 -d\\^ + 2trace FF 



N. 



(17) 



In analogy with the denominator of the GCV function, 
we identify the trace term with the sum of the filter fac- 
tors, 7. The optimal regularization parameter is chosen 
as the value of A that minimizes C/(A). 

Iterative methods complicate the calculation of the 
GCV and UPRE functions since we do not know the filter 



factors a priori, nor do we have the decomposition of i^, 
which can be expensive due to the sparsity and size of the 
matrix. In this case, we estimate t he denominator by a 
Monte Carlo method (|Girardlll989[ ). This allows two ad- 
vantages: we approximate the number of source degrees 
of freedom while simultaneously finding an approxima- 
tion to the optimal regularization parameter. Using an 
iterative method, we find these quantities as we solve for 
the source intensity distribution. This is accomplished 
by running iterations on both d and d, where the vector 
d is a discrete white-noise vector composed of elements 
that are ±1 with equal probabili ty, as commonly used in 
the image processing literature ('IIut chinsonlll990l iVogell 
[2002; Bardslcv 2006; Favati et al. 201^. 



We form the product d f, where r 



d - F~s, 



This quantity approximates the denominator of the GCV 
function and therefore the number of deg rees of freedom 
in the iterative problem (fGirardl 119891 : iHanse n[ [T99l . 
This calculation requires twice the work during the it- 
erative process and therefore effectively doubles the exe- 
cution time of the code to solve for the source intensity 
function. However, since we generally need only a small 
number of iterations to solve a gravitational lens sys- 
tem, this extra work is acceptable due to the amount 
of information the calculation provides. By using this 
Monte Carlo estimate, we find the number of effective 
degrees of freedom and evaluate Equation (IT^ at each 
iteration. Once we have evaluated an arbitrary number 
of iterations, we find the minimum of the GCV function 
and select the critical number of iterations necessary to 
produce an optimally regularized source. The estimate of 
the number of source degrees of freedom and the residual 
at this iteration are used to evaluate the reduced of 
the lens model. A similar procedure can be used to eval- 

~T 

uate the UPRE function using d Fs^ to approximate 7, 
t he trace term in Equatio n PD) . 

[Rogers k, Fieed ([2011a[ ) explored the L-curve method 
for the selection of regularization parameters in gravita- 
tional lens modeling, arguing that the L-curve provides 
a useful parameter selection method that yields results 
which are easy to interpret. However, using this selection 
criterion can be difficult due to the curvature calculation, 
which requires spline fitting of the points on the L-curve 
and the curvature of the resulting smoothed curve. This 
calculation is non-trivial and results can be somewhat 
sensitive to the details of the fitting procedure. The 
GCV and UPRE functions require more involved sta- 
tistical arguments but provide more robust and reliable 
selection methods, since the functions are calculated at 
each iteration simultaneously with the linear optimiza- 
tion. We find that the GCV and UPRE methods produce 
results that are consistent with one another, indicating 
that both can be used effectively to determine the op- 
timal termination condition for the iterative solver. We 
prefer evaluating the GCV and UPRE functions to the 
L-curve method for the reasons outlined above and focus 
on these parameter selection routines in this study. 



3. THE SLAGS SURVEY 
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The Sloan Lens ACS Survey (SLAGS) was conducted 
using the Hubble Space Telescope ACS instrument 
(jBolton et alj [20061) . The survey has detected 70 early 
type galaxies with definite lensed sources in the redshift 
range z = 0.06 to z = 0.33. The candidate systems were 
chosen by spectral analysis of galaxies in the luminous 
red galaxy (LRG) and MAIN samples of the Sloan Dig- 
ital Sky Survey (SDSS; http://www.sdss.org). Potential 
gravitational lens candidates were discovered when two 
distinct redshifts were seen within a single SDSS spec- 
trum. We use reduced SLAGS data from Bandara et al.l 
(j2009), who modeled the surface brightness of the E/SO 
lens galaxies using the sum of two components, a Sersic 
bulge (Equation (jSj) and an exponential disk. The PSF 
model from the AGS library was used in the surface 
brightness subtract ion, making use of the GIM2D code 
(jSimard et ar2002^ . AU of the dat a are F814W /-band 
images. See B andara et al.l (|2009l ) for more details on 
the reduction procedure. 

4. RESULTS 



iBolton et all (|2008[ ) modeled the SLAGS gravitational 
lens systems using analytical Sersic and Gaussian source 
models to describe the intensity distribution in the source 
plane. A subset of 15 of these systems were further in- 
vestig ated using the semilinear method (jKoopmans et al.l 
[200^. We focus on six of the SLAGS lens systems in 
this paper, and plan to model more of them in the fu- 
ture. Since they have been well studied using several 
established methods, the SLAGS galaxies provide a use- 
ful consistency check for verifying the results of our lens 
modeling code. 

The SLAGS systems are modeled using a singular 
isothermal elliptical mass density (SIE). We define a dis- 
tance -0 = \/ qx^ + y'^/q, such that the defiection angle 
a = {ax, Oy) is given by 



b 

X — — tan 
If 

b 

— — tanh 
If 



qfX 

IfV 





(18) 



(19) 



with qf = ^1/q — q, and Einstein radius b. In the limit 
g — >■ 1, the model corresponds to a singular isothermal 
sphere with Einstein radius 



47r- 



(20) 



where Uy is the velocity dispersion, c the speed of light, 
Dds the distance between the deflector and the source, 
and Ds the distance between the observer and the source. 
These distances depend on the corresponding redshifts 
Zd and Zs and determine angular diameter distances that 
depend on the cosmological model used. We assume a 
standard cosmology with Hubble constant = 70 km 
s~^ Mpc~^, matter density fio = 0.3 and cos r nolog i- 
cal constant Ao = 0.7. Following IBolton et al.l (j2008D . 
we adopt the intermediate-axis normalization of the SIE 
(jKormann. Schneider and Bartclmann 1994). This nor- 
malization fixes the mass within given isodensity con- 



tours for constant 6, and is implemented in the deflec- 
tion angles above. iKoopmans et al.l (|2006[ ) showed that 
the SIE is a useful model of early type isolated galax- 
ies because the lens density ellipticity and orientation 
were found to align well with the surface brightness of 
the SLAGS lens galaxies, indicating that light closely 
traces mass for these systems. No significant external 
shear was found t o imp rove the fits. We therefore follow 
IKoopmans et al.l (j2006D and adopt the SIE as a good 
model to represent isolated the early type E/SO SLAGS 
lens galaxies. 

We cropped out the residuals left over from the surface 
brightness subtraction of the lens in the F814W SLAGS 
data, and cropped the field of view to the region of in- 
terest, but performed no rebinning or other manipula- 
tion of the data in any way. Our lens models use the 
same AGS PSF that was used for the lens galaxy sub- 
traction. Althoug h it is known that th e AGS PSF is po- 
sition dependent ([Bandara et al.|[2009t ). we simplify our 
treatment by assuming a constant PSF over the region of 
interest, though we have previously developed methods 
to include spatially variant P SFs in the gravitational lens 
problem (Rogers & Fiegc 2 01 Ibl ) . We output the sigma 
image from the GALFIT code (|Peng et al.l l2010( ) that 
corresponds with the region of interest to estimate the 
errors on the image plane. We emphasize that the main 
focus of this work is to study the regularizing properties 
of the GGLS method on the derived solutions with the 
GGV and UPRE schemes to select the optimal level of 
regularization. 

Our analysis initially solves for the parameters of an 
analytical source model, which we use as an approximate 
solution to a more refined model that uses a pixelated 
source. We start by treating the source plane intensity 
distribution as a sum of Sersic profiles, using the same 
number of a nalytical sour c e corn ponents to model each 
system as in IBolton et al.l (|2008[ ). The SIE lens is used 
to find the lensed image of the source plane, which is 
convolved with the appropriate AGS PSF. We sear ch for 
the g lobal minimum of x^, using the Ferret GA (iFieg d 
12010( 1 to fit both the lens density and source parameters. 
Once we find an approximation to the global minimum, 
we select a volume of lens parameter space in the neigh- 
borhood around the best fit lens model. Noting that 
Ferret is used predominantly as a bounded optimizer, 
this neighborhood becomes the search volume in the next 
step of our method, which replaces our analytical source 
model with a pixelated source. The optimization of a 
pixelated model requires a new Ferret run, which be- 
gins with the search volume found in the previous step 
populated initially by random lens models. Normally, 
we expect the lowest model to reside within this vol- 
ume; however, we configure the optimizer using "soft" 
boundaries, which allows the GA to move outside of the 
predefined search volume if the initial approximation is 
bounded too tightly. This option allows Ferret to expand 
the search space if a large fraction of the GA population 
occupies positions close to the boundaries of the param- 
eter space. In general, the lens parameters of our pix- 
elated sources were found to reside within these search 
volumes and agree well with the analytical approxima- 
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Table 1 - Lens Model Parameters 



SDSS System 




Zs 


cr-v (km s ■"-) 


<? 


b(") 


J0037-0942 


0.1955 


0.6322 


286 


0.825 


1.55 


J0216-0813 


0.3317 


0.5235 


351 


0.783 


1.18 


J0737+3216 


0.3223 


0.5812 


291 


0.661 


0.99 


J0912+0029 


0.3240 


0.1642 


341 


0.561 


1.59 


J0956+5100 


0.2405 


0.4700 


318 


0.620 


1.33 


J1402+6321 


0.2046 


0.4814 


292 


0.843 


1.34 



TABLE 1 

Lens model parameters for a subset of the SLACS systems 

FOUND BY THE FeRRET GA WITH SOURCE RECONSTRUCTION BY 
THE CGLS ROUTINE. 



tions. We compute our best refined model by optimizing 
the lens and source plane parameters using a pixelated 
source and regularizing iteration selected by the GCV 
and UP RE functions. 

In addition to the regularizing effect of truncated it- 
eration, we have found that enforcing non-negativity in 
the source solutions dramatically improves the quality of 
the reconstruction and tends to further reduce remaining 
structure in the image residuals. As a final step, we have 
modeled the set of best-fit lens models with the modi- 
fied residual no r m steepest descent alg o rithm (MRNSD ; 
[Kaufman 1991; INaev fc Strakol [2000t IBardslevI 120061) . 
This algorithm is a bounded SD optimization routine 
that seeks sources with Sj > 0. In practice we have 
found that the MRNSD method can produce residuals 
which decrease in a step-like manner, making the de- 
termination of the minimum difficult for the GCV and 
UPRE f unctions. Th i s zig- zag behavior has also been 
noted by iFavati et al.l (|201Clf) in the context of the stan- 
dard deconvolution problem. 

Combining analytical and pixelated sources greatly im- 
proves the efficiency of the search, since analytical models 
can be evaluated very quickly. Searching using pixelated 
sources is a more intensive process, and time can be saved 
by adopting the semilinear method only once we have a 
good approximation to t he lens parameters corre spond- 
ing to the minimum x^- iRogers fc Fiegd ()2011al ) noted 
that a set of trivial pixelated solutions exist when global 
optimization methods are used to model lensed systems. 
These trivial solutions are found when the effect of the 
lens is reduced, resulting in sources that closely resemble 
the data. The two-stage optimization process is useful 
since the initial analytical sources are generally not as 
flexible as pixelated sources, and thus provide a natu- 
ral method for avoiding exploration of the trivial regions 
of the parameter space. The analytical stage of the al- 
gorithm terminates once the GA has converged and we 
no longer see improvement in the population. Typically, 
convergence requires only 50 — 100 generations using a 
population of 300 individuals for the analytical portion 
of the optimization, and approximately 100 iterations for 
the second semilinear optimization stage. 

The final velocity dispersion ay, axis ratio q, and Ein- 
stein radius b of our models are shown in Table[T] The re- 
duced data, model image, recovered non-negative source 
and residuals are shown in Figures [1] and [21 Our results 
agree with the SLACS lens models for each system to 
within 3% in velocity dispersion ay. Both the pixelated 



and analytical source plane intensity distributions agree 
with one another in all cases. Our lens modeling r esults 
agree with the parameters in iBolton et al.l ()2008D very 
well. The reduced statistic for all systems is very 
close to unity. 

The Einstein radius (velocity dispersion) and elliptic- 
ity of SDSS J0737 -f 3216 are similar to the model from 
IBolton et al.l (I2008D. and shar e a velocity dispersion sim- 
ilar to [Marshall et al.l (|2007| ). However, the recovered 
ellip ticity is smaller than both iKoopmans et al.l (|2006f ) 
and [Marshall et al.l (j2007[ ). We found a lower elliptic- 
ity from both the initial and analytical source fit and by 
pixelated source modeling. To illustrate the difference 
between analytical and pixelated source lens models, we 
show the lens parameter space in Figure [H Points in 
this figure are shaded according to confidence interval, 
and demonstrates that the la, 2a and 3cr confidence re- 
gions are larger for the pixelated source than the ana- 
lytical source. This shows that a more flexible pixelated 
source can broaden the error bars on the lens param- 
eters when compared to an analytical description. We 
have observed this result for all the lens systems that we 
studied. The SDSS J0912 + 0029 data is heavily contam- 
inated with noise, although it is adequately fit by our 
GCV and UPRE regularized solution, and our analyti- 
cal and pixelated sources agree. Of all of the systems, 
SDSS J0956 + 5100 and SDSS J0737 + 3216 show the 
most structure in the residuals, although the magnitude 
of these residuals are small (< 1%) compared to the in- 
tensities of the image pixels. In fact, the largest sys- 
tematic effects present in most of the residual images in 
Figures [Hand [2] are produced from the subtraction of the 
intensity profile of the lens galaxy. The GCV and UPRE 
selected iterations are identical for the lens parameters 
above, and have been observed to generally differ by only 
a few iterations. 

Overall we are encouraged by our results since we were 
able to recover the SLACS lens parameters and gen- 
eral source morphologies. The results could be improved 
slightly by including a final local optimization step to 
'polish' the results returned from the GA. We did not 
detect any parameter space degeneracies except for the 
expected position angle degeneracy that leaves the solu- 
tion unchanged when the elliptical mass distribution is 
rotated by 180°. 

We have used the GCV and UPRE approaches with 
both CGLS and SD, and find similar results for both 
of these algorithms. The SD routine takes much longer 
than the CGLS method to converge, although it is in 
general a more stable approach to regularization and has 
been suggested as a superior routine for image deblur- 
ring prob lems due to its reduce d sensitivity to stopping 
criterion ()Nagv fc Palmed I2003D . The best fit MRNSD 
solutions are found by comparing the solution at each it- 
eration to the optimally regularized CGLS solution. This 
comparison minimizes 

lUCGLS _ „MRNSD|| 

Z - Efc II (21) 

||2,CGLS|| ^^-^^ 

where x'~^'^^^ is the optimally regularized CGLS solution 
and x^^^^^ the non-negative solution at the fcth itera- 
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tion of the MRNSD algorithm. In general the MRNSD 
residuals appear smoother than the residuals of the 
CGLS models. This is due to the reconstruction of back- 
traced noise present in the CGLS solutions. The filter 
factors of the CGLS method are given by a recursion re- 
lation that depends on all of the singular values (Hansen, 
I2OIOI) . Even though CGLS tends to suppress high fre- 
quency noise at the beginning of the optimization pro- 
cess, the high frequency components are not completely 
damped out at any given iteration and build up over the 
course of a run. Hence, even the optimally regularized 
solution still contains some high-frequency components 
that correspond to back-traced noise. The MRNSD al- 
gorithm seems to be more robust to the propagation of 
high-frequency noise in the recovered non-negative solu- 
tions, thus producing images that are naturally smoother 
than the corresponding CGLS sources. 

Regularization by truncated iteration in the context 
of Krylov optimization is the simplest of many regular- 
ization methods that can be used. Truncated iteration 
regularization produces solutions (figures [T] and [2]) which 
are less smooth than the se cond order (cu rvature) regu- 
larization used in Koopman s et al.l (l2006f). It has been 
suggested that the LSQR algorithm (jBiorclj 119961 ) can 
generally accomfffmodate more complicated regulariza- 
tion schemes with increased numerical stability when the 
system is poorly conditioned. The LSQR routine is an it- 
erative Krylov subspace method that s olves least-squares 
probl ems using QR decomposition (jPaige fc Saundera 
119821 ). We previously tested LSQR in the context of 
gravitational lens modeling using s imulated data with 
the L-curve (jRogers fc Fieed r2011aD , and suggest that 
this scheme may provide a higher level of control over 
regularizing effects than the truncated iteration scheme 
used in this study. 

Figure [3] illustrates the regularizing behavior of the 
CGLS routine as a function of iteration k using SDSS 
J0216 — 0813 as an example. Iterative methods like 
CGLS produce a sequence of solutions, which we com- 
pare with the optimally regularized Baye sian solutions 
found by the method of iSuvu et al.l ([2006), making use 
of Equation ([2T|) . The top panels of this figure com- 
pare the solution at each iteration with intensity (ze- 
roth order), gradient (first order), and curvature (second 
order) regularization used with the semilinear method. 
The solution at earlier iterations is more heavily regular- 
ized, with a larger portion of high frequency components 
damped. These early iterations produce solutions that 
resemble the results of the semilinear method using gra- 
dient and curvature regularization terms. The solution 
from later iterations contains a larger number of high 
frequency components, which simulates the effect of us- 
ing zeroth order regularization in the semilinear method. 
Note that the truncated iterations of the CGLS method 
do not produce identical solutions to those found from 
the semilinear method, which is not surprising since the 
filter factors of the CGLS approach differ significantly 
from linear regularization methods. 

The GCV and UPRE functions are plotted on the lower 
left and center panels of Figure [H The selected stopping 
iteration is found from the location of the minima of these 
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TABLE 2 

A COMPARISON OF BAYESIAN SELECTED REGULARIZATION USING 
THE SEMILINEAR METHOD WITH THE GCV AND UPRE FUNCTIONS. 
Nimg AND Nsrc ARE THE NUMBER OF PIXELS IN THE IMAGE PLANE 
AND SOURCE PLANE. THE COLUMN MARKED "ReG." IS THE 

regularization type for each system (i, intensity; g, 
gradient; c, curvature), neff is the effective number of 
decjrees of freedom for the corresponding lens model in 
Table [T] and 7 gives the effective number of source 
degrees of freedom. the reduced is given in the 

rightmost COLUMN FOR EACH METHOD. THE GCV AND UPRE 
FUNCTIONS GAVE IDENTICAL RESULTS FOR EACH OF THE SYSTEMS 
USING THE CGLS METHOD. FOR THESE CALCULATIONS WE 
ESTIMATED N^ff AND 7 USING THE MON TE C ARLO APPROACH 
DESCRIBED IN SeCTI0n |2.1I 



functions, which correspond with one another in all of our 
test cases in Table [T] and are marked with circles. These 
selection schemes both favor the solutions from early it- 
erations. The lower right panel plots the L-curves using 
all three types of regularization. Note that in this case 
the L-curves from gradient and curvature regularization 
match the GCV and UPRE solutions. However, we of- 
ten observe that the L-curve can show false curvature 
maxima when iterative optimizers make rapid progress 
early in the run, leading to dramatically over regular- 
ized solutions. The GCV and UPRE functions avoid this 
problem. Combined with the statistical arguments used 
to derive the GCV and UPRE functions and the more 
robust behavior of these regularization parameter selec- 
tion methods, we conclude that the GCV and UPRE ap- 
proaches are more useful than the L-curve methodology 
for solving the least-squares source deconvolution prob- 
lem for gravitational lens systems. The comparison of 
our results with the optimally regularized Bayesian solu- 
tions is shown in Table [2] for each of the systems that we 
modeled. 

We have marked two additional points on the GCV 
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curve in Figure |31 These points signify over regular- 
ized and under regularized solutions. The sources cor- 
responding to the solution of the SDSS J0216 — 0813 sys- 
tem at these iterations are shown in Figure |4] using both 
CGLS and MRNSD algorithms. As shown in this figure, 
the over-regularized solutions are over-smoothed, and the 
under-regularized solutions include too many high fre- 
quency components. The corresponding MRNSD solu- 
tions were found by terminating iterations when Equa- 
tion is minimized. Furthermore, note the presence 
of back-traced noise in the reconstructions using CGLS, 
while this noise is effectively suppressed in the MRNSD 
solutions. 

In general, the difficulty in making use of any regular- 
ization parameter selection scheme is the need to evalu- 
ate each model using a range of regularization constants. 
This process can be expensive if the time for each eval- 
uation is large. The benefit in making use of an itera- 
tive scheme is that each iteration can be thought of as a 
discrete regularization parameter. Iterative methods are 
attractive since expensive matrix inverses are not calcu- 
lated directly, and a regularization parameter selection 
method can be used at each iteration to determine the 
optimal regularization strength (i.e., the optimal stop- 
ping iteration). Since the GCV and UPRE functions can 
be evaluated while iterations run, an optimally regular- 
ized solution can typically be found in the time needed 
to solve a system using a single value of the regulariza- 
tion constant with the semilinear method. This time sav- 
ings is important when using global optimization schemes 
since a large number of models need to be evaluated with 
these methods. 

Another attractive prospect of iterative methods is the 
fact that iterations can be carried out without explicit 
representations of the matrices themselves. For exam- 
ple, subroutines which perform the lensing and blurring 
operations can be substituted for B and L while pre- 
serving the least-squares form of the problem. There- 
fore, even large scale systems that would prohibit the 
direct application of the semilinear method in matrix 
form can be practically solved and op timally regular- 
ized b y an iterative K rylov rnethod (jRogers fc Fiegd 
l2011allH ). For example, lAlardI ()2009[ ) has modeled the 
cluster lens SL2SJ021408-053532, which is comprised of 
a small group of 6 galaxies and results in a set of large 
lensed arcs. As noted in that work, the large size of 
the system prevents direct application of the semilinear 
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method. However, iterative approaches with algorithmic 
substitutions for the explicit representations of the blur- 
ring and lensing matrices can accommodate large-scale 
lensing problems that are realistic for a number of prac- 
tical modeling situations. 

5. CONCLUSIONS 

We have used iterative Krylov methods to model a sub- 
set of the SLAGS lenses using GGV and the UPRE to 
select the optimal regularizing iteration. We addressed 
the problem of the number of effective degrees of freedom 
in the source by making use of parameter choice meth- 
ods that are commonly used in standard image decon- 
volut ion problems. Th is approach leads to a key result 
from lSuvu eTall (|2006( ) that was derived using Bayesian 
methods. The GGV and UPRE functions shed light on 
the concept of optimally regularized sources and provide 
an efficient method to select regularization parameters 
for iterative schemes. A non-negative bounded iterative 
algorithm is found to significantly improve the quality of 
the reconstructed sources. This approach provides non- 
negative solutions through linear optimization, which is 
significantly simpler to implement than other constrained 
opt imiz a tion techniques such as the maximum entropy 
method (jSkilling fc Brvanlll984l:[Wavth fc Web ster'20061) 
that require the use of more complicated non-linear op- 
timization schemes. 

The lens parameters recovered by the Ferret GA 
are similar t o prev iously published results found by 
iBolton et all (pOOl and we find consistency between 
analytical approximations to the source plane intensity 
based on a sum of Sersic profiles. We plan to investi- 
gate a larger sample of the SLAGS lenses in the future 
and explore other local optimization methods to solve 
the least-squares problem with a variety of regulariza- 
tion schemes. 
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Fig. 1. — A selection of SLACS gravitational lenses. The sources are non-negative and found using the MRNSD algorithm as the final 
polishingstep. The columns show the data d, image model, source model s and residual r respectively. The model parameters are given 
in table[l] Top row: SDSS J0037-0942, second row: SDSS J02f6-0813, bottom row: SDSS J0737-I-3216. Model images and sources in this 
figure were produced using an image pixel subsampling factor of 3. 
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Fig. 2.— Top row: SDSS J0912+0029, second row: SDSS J0956+5100, bottom row: SDSS 1402+6321. Model images and sources in this 
figure were produced using an image pixel subsampling factor of 3. 
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Fic;. 3. — Top row: Comparison of the difference between CGLS at each iteration k and Bayesian selected solutions using intensity 
(top left panel), gradient (top center) and curvature regularization (top right). Early iterations correspond more closely to gradient and 
curvature regularized solutions while the later iterations more closely approximate intensity regularization. The GCV and UPRE functions 
select the iteration marked by circles, and the intensity L-curve picks the under-regularized solution marked with a square. Bottom row: 
Selection methods as a function of iteration including the GCV function (bottom left panel), the UPRE function (bottom center) and a 
set of L-curves found by using the intensity, gradient and curvature norms of the CGLS solutions. In this case the curvature and gradient 
L-curves select the same solution as the GCV and UPRE functions. The iterations marked with triangles on the GCV plot represent 
over-regularized and under-regularized solutions, respectively. 
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Fig. 4. — Three solutions for SDSS J0216-0813 marked in the lower left-hand panel of Figure |3] These solutions correspond to over- 
regularized (left), critically-regularized (middle) and under-regularized solutions (right) as selected by the GCV function. Note the emphasis 
on back-traced noise in the under-regularized CGLS solution and the excessive smoothing of the over-regularized solution. Non-negative 
MRNSD solutions are shown on the second row. 
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Fig. 5. — An example of the lens parameter space mapping for J0737+3216 using the GA. Solutions within the la contour arc black, 2a 
mid gray and 3o" light gray. The top row shows the lens parameter space with a pixelated source (lens velocity dispersion a^, cUipticity q 
and lens orientation angle d), and the bottom row shows the lens parameter space for the analytical source. Note that the position of the 
best fit lens model changes when a pixelated source is used. The right column demonstrates the expected degenera<;y between the velocity 
dispersion and lens orientation angle. 



